Abstract. This paper reports on the potential of Oral Elicited Imitation (OEI) as a
format for output practice, building on an analysis of picture-matching and spoken 
data collected from 36 university-level learners of German as a second language (L2) 
in a web-based assessment task inspired by Input Processing (VanPatten, 2004). The 
design and development of OEI for output practice faces two key challenges: learners 
must be engaged in meaningful language processing rather than in mere repetition 
of oral stimuli, and the task must eventually provide individualized and qualitative 
corrective feedback that helps learners to notice gaps between their interlanguage 
and the target language. Results show that learners attended to meaning and that 
a commercially available speech recognition tool was able to transcribe learner 
speech remarkably well. 

Keywords: computer-assisted practice, elicited imitation, speaking, morphosyntax,

1. Introduction

In many instructed foreign language learning contexts, opportunities for spoken 
practice with individualized feedback are scarce. A candidate task for such practice 
is OEI. In its most basic form, OEI requires learners to repeat oral stimuli (typically 
sentences) as exactly as possible. It has been mainly used for language assessment, 
building on the assumption that learners can only accurately repeat sentences 
they have comprehended and parsed, and will access corresponding mental 
representations (internalized lexicons and grammars) to reproduce the stimuli. 
Previous research has shown that OEI can measure implicit knowledge (Erlam,
2009) and oral proficiency (Tracy-Ventura, McManus, Norris, & Ortega, 2014).
Additionally, language technology researchers have shown that OEI tests can be
automated through Automatic Speech Recognition (ASR) technology (Graham
et al., 2008). These findings open opportunities for computer-assisted oral output
practice.

However, two preconditions need to be fulfilled for OEI to be a useful task for
output practice. First, OEI-based practice must engage learners in meaningful
language processing; otherwise, learners may simply parrot what they
hear (Erlam, 2009). The instructional design of OEI-based practice must therefore
rely on sound principles of L2 teaching and learning. Second, the task must
give automated corrective feedback that can help learners to notice gaps between
their speech and the target language. This requires language technology that can
accurately recognize learner speech and detect mismatches between learner output
and the target language.

The current study investigated the potential of automating OEI for output practice in 
German L2 while taking these two requirements into account. It addressed two
research questions:


• Did the participants attend to meaning? 

• How accurately does a state-of-the-art ASR system transcribe the participants’
production?


2. Method 

We designed and developed a web-based and meaning-focused OEI test 
inspired by Erlam’s (2009) implementation of OEI as well as by VanPatten’s 
(2004) theoretical framework of Input Processing. Learners were required to 
choose between competing visual representations of the oral stimulus before 
speaking. Immediately after the oral stimulus, two pictures were shown, 
visualizing alternative interpretations of the stimulus (see Figure 1). This served
two purposes. First, pairing oral stimuli with pictures would stimulate syntactic and
semantic processing. Second, inserting the picture-matching step between the
listening and speaking phases introduces a time interval, potentially ‘flushing’
learners’ auditory working memory. This is a critical design choice if learners are
to draw on their internalized lexicon and grammar when speaking rather than on
their short-term memory.






Frederik Cornillie, Kristof Baten, and Dirk De Hertog 


Figure 1. Pictures representing competing interpretations of the ungrammatical 
stimulus *Der Mann gibt die Frau den Apfel

Picture-matching data and spoken data were collected from 36 university-level
learners of German L2. Eleven students were in their second year of the academic
bachelor program in languages, ten in the third bachelor year, and 15 in the
master’s program. Their level of proficiency according to the Common European
Framework of Reference for languages (CEFR) was estimated between B2 and C1.
Eleven students had had an Erasmus stay in a German-speaking country.

Each student was presented with 48 stimuli (comprising grammatical and 
ungrammatical sentences) that aimed to assess knowledge of case marking and 
word order; 16 sentences focused on transitives (e.g. Der Hund verfolgt den Mann,
‘the dog chases the man’), 16 on ditransitives (e.g. Die Lehrerin schenkt dem
Direktor die Blumen, ‘the teacher gives the headmaster flowers’), and another 16
on prepositional phrases (e.g. Der Mann spaziert durch den Tunnel, ‘the man walks
through the tunnel’). Length of the stimuli ranged between five and eight words. 
The software logged students’ interpretations of the sentences; their speech was 
recorded with Audacity. Students wore headsets throughout the experiment.

After the experiment, the speech data were transcribed manually and through Google’s
ASR software (Cloud Speech API). This ASR was selected because it is known to work
relatively well, can be easily plugged into applications through its API, and recognizes
over 80 languages and variants, which would allow the solution to be scaled to other languages.

For the first research question, we analyzed scores on the picture-matching tasks 
(assessing meaning recognition). If the participants did not attend to meaning during 
the OEI task, average scores at about the chance level (50%) could be expected. 
Also, higher-level students were hypothesized to have higher scores. Additionally, 
we assessed linguistic variation in the learners’ responses through manual inspection 
of the corpus. If semantic variation occurred in addition to (morpho-)syntactic 
variation, then chances were higher that the learners had also processed the stimuli 
for meaning. 
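
The chance-level reasoning above can be illustrated with an exact binomial tail probability. The sketch below is not part of the study's reported analysis; it assumes a hypothetical learner who matched 37 of the 48 pictures correctly (about 77%) and asks how likely such a score would be under pure guessing between two pictures. The function name and the example score are illustrative assumptions.

```python
from math import comb


def binomial_tail(correct: int, trials: int, p_chance: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= correct) if every answer is a guess."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical learner (assumption, not study data): 37 of 48 items correct.
p_value = binomial_tail(37, 48)
print(f"P(X >= 37 | guessing) = {p_value:.6f}")
```

A very small tail probability would indicate that a score of this size is unlikely to arise from guessing alone, which is the logic behind comparing the observed averages with the 50% chance level.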

For the second research question, distance metrics were computed between the 
manually and automatically transcribed data. We used Levenshtein distance at 
the word level, which reflects the number of changed, inserted, or deleted words 
(for each response). As ASR performance may have been affected by the learners’ 
linguistic level, mean Levenshtein distance per student was regressed on the 
students’ year of study and their Erasmus experience. 
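
As an illustration of the metric described above, the following minimal sketch computes Levenshtein distance over words rather than characters. It is not the code used in the study: the function name, the whitespace tokenization, and the lowercasing step are assumptions made for the example.

```python
def word_levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the reference transcript into the hypothesis."""
    # Assumption: case-insensitive comparison on whitespace-separated tokens.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # previous[j] holds the distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,         # reference word deleted
                    current[j - 1] + 1,      # hypothesis word inserted
                    previous[j - 1] + cost,  # word matched or substituted
                )
            )
        previous = current
    return previous[-1]


manual = "Die Frau gibt den Mann den Apfel"  # manual transcript
asr = "Die Frau gibt dem Mann den Apfel"     # hypothetical ASR output
print(word_levenshtein(manual, asr))         # → 1
```

In this example the only difference is one substituted word (den > dem), so the distance is 1; identical transcripts yield 0, which corresponds to perfect recognition.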


3. Results 

The average percentage of correct responses overall was 78%. Students in the 
second bachelor year scored 78.1% on average, in the third year 74.4%, and 
students in the master’s program 80.3%. A one-way ANOVA showed that the difference
between the groups was not significant (F(2,33)=0.88, p=0.42).

Manual inspection of the corpus revealed instances of semantic variation (e.g. Der
Mann ist gegen den Baum gefahren > Der Mann ist gegen den Baum gefallen),
morphological variation (e.g. Die Lehrerin schenkt dem Direktor die Blumen >
*Die Lehrerin schenkt den Direktor den Blumen), syntactic variation (e.g. Dem
Direktor schenkt die Lehrerin die Blumen > Die Lehrerin schenkt dem Direktor
die Blumen), and combinations of these (e.g. Dem Sohn zeigt der Vater die Brille
> *Der Vater schenkt der Junge den Junge die Brille). In addition, there were
instances of self-correction (e.g. Das Mädchen kommt aus der Shop - dem Shop),
disfluencies (e.g. Der Doktor verklauf verkauft dem Clown das Buch) and multiple
repetitions of the sentence, with or without self-corrections (e.g. Die Frau gibt den
Mann den Apfel. Die Frau gibt dem Mann den Apfel.)

Levenshtein distance between the manually and automatically transcribed
responses ranged between 0 (perfect recognition) and 6 (six words changed,
inserted, or deleted). The mean was 0.55 and the median was 0. Of the 1487
automatically transcribed responses, 979 (about 66%) had an edit distance of zero.
The regression analysis did not reveal any effects of year of study or Erasmus
experience (F(3,27)=0.3917, p=.16, adjusted R²=-0.06).


4. Discussion and conclusions 

This study aimed to assess the potential of OEI for output practice in German L2. 
First, high scores on the picture-matching task as well as instances of semantic
variation in learner speech suggest that the assessment task stimulated meaningful
language processing, even if it was constrained and rather form-focused.

Secondly, Google’s ASR service performed remarkably well on the non-native
speech: the median word-level edit distance from the manual transcriptions was 0.
These results bode well for the further development of the task. However,
it must be taken into account that the study was limited to higher-level students 
whose mother tongue is phonetically rather similar to the target language; lower- 
level students from different mother tongue backgrounds may speak less fluently 
or in more accented ways, potentially affecting ASR performance. 

The next step will be to go beyond simple distance metrics and automatically detect 
the different types of linguistic variation in order to develop feedback modules for 
an implementation of this task for L2 practice. This will be done with a view to 
conducting an experiment that aims to examine the effect of automated feedback 
on accuracy, fluency, and perhaps complexity (lexical diversity).